Explainable deep learning for insights in El Niño and river flows

The El Niño Southern Oscillation (ENSO) is a semi-periodic fluctuation in sea surface temperature (SST) over the tropical central and eastern Pacific Ocean that influences interannual variability in regional hydrology across the world through long-range dependence or teleconnections. Recent research has demonstrated the value of Deep Learning (DL) methods for improving ENSO prediction as well as Complex Networks (CN) for understanding teleconnections. However, gaps in predictive understanding of ENSO-driven river flows include the black box nature of DL, the use of simple ENSO indices to describe a complex phenomenon and translating DL-based ENSO predictions to river flow predictions. Here we show that eXplainable DL (XDL) methods, based on saliency maps, can extract interpretable predictive information contained in global SST and discover SST information regions and dependence structures relevant for river flows which, in tandem with climate network constructions, enable improved predictive understanding. Our results reveal additional information content in global SST beyond ENSO indices, develop understanding of how SSTs influence river flows, and generate improved river flow prediction, including uncertainty estimation. Observations, reanalysis data, and earth system model simulations are used to demonstrate the value of the XDL-CN based methods for future interannual and decadal scale climate projections.

The El Niño Southern Oscillation (ENSO) is a semi-periodic fluctuation in sea surface temperature (SST) over the tropical central and eastern Pacific Ocean that influences interannual variability in regional hydrology across the world through long-range dependence or teleconnections. Recent research has demonstrated the value of Deep Learning (DL) methods for improving ENSO prediction as well as Complex Networks (CN) for understanding teleconnections. However, gaps in predictive understanding of ENSO-driven river flows include the black box nature of DL, the use of simple ENSO indices to describe a complex phenomenon and translating DL-based ENSO predictions to river flow predictions. Here we show that eXplainable DL (XDL) methods, based on saliency maps, can extract interpretable predictive information contained in global SST and discover SST information regions and dependence structures relevant for river flows which, in tandem with climate network constructions, enable improved predictive understanding. Our results reveal additional information content in global SST beyond ENSO indices, develop understanding of how SSTs influence river flows, and generate improved river flow prediction, including uncertainty estimation. Observations, reanalysis data, and earth system model simulations are used to demonstrate the value of the XDL-CN based methods for future interannual and decadal scale climate projections.
The El Niño-Southern Oscillation (ENSO) is a primary mode of interannual weather variability around the globe. ENSO modulates flood timings in Africa 1 , interannual variability of flow in the Ganges, the Amazon, and the Congo rivers 2,3 , and has significant influences on regional climate and hydrologic patterns around the globe. A predictive understanding of ENSO is thus of economic and societal importance. However, and our ability to predict ENSO with physics-based numerical simulations or data-driven models at interannual, decadal, and multidecadal time horizons have remained relatively poor 4 , which has in turn hindered our ability to assess and leverage the predictability of ENSO's hydrometeorological effects.
Some challenges in ENSO forecasting may be traced back to data limitations, such as the relatively arbitrary rectangular regions that determine ENSO indices. Studies have suggested that ENSO is part of a larger system of interrelated SST oscillations which may co-impact regional hydrometeorology 5 . Further, our understanding of physical mechanisms 6 along with data-driven methods 7 suggest that the relationships between ENSO and river flows may be highly nonlinear. The resulting complexity of the earth system calls for methods that can leverage complete information content from global SST data and identify complex geographic dependence structures, which include both proximity-based dependence and long-range teleconnections. Figure 1 shows SST anomalies in year 2008 when there was a cool year (La Nina phenomenon), while Fig. S1a, S1b show SST anomalies in a warm year (El Niño) and a neutral year, respectively. The relationships between river flows and ENSO indices indicate the possibility of significant nonlinear dependency (Table S3 and Figs. S2, S3).
Commonly used methods to identify dependencies among climate variables include visual comparison 8 , correlation 9 , mutual information 7 , coefficient of determination 10 , and weights in (sparse) linear regression 11,12 . These methods often require heuristic expertise in selecting features and can be difficult to extend to more complex features such as three-dimensional spatiotemporal features. In the recent years, deep learning methods have seen preliminary success in climate science, meteorology, and hydrology, resulting in improved Oceanic Niño Index (ONI) in the Niño 3.4 region at the same time-period. The ONI data are from United States Climate Prediction Center (NOAA 2021). Warm (red) and cold (blue) periods show months that are higher than +0.5°C or lower than −0.5°C threshold for a minimum of five consecutive months. A warm/cold year is a year when warm/cold anomaly months dominate, and a neutral year is a year that is neither a warm nor a cold year. For Amazon, the river flow decreases during the warm period and increases during the cold period. However, the relations between Congo River flow and ONI are more complicated and not obvious.
predictive skills and the development of methods to investigate the spatiotemporal dependencies 13,14 . Furthermore, methods for interpretation and explanation of deep neural networks, such as saliency maps, can be adapted to climate problems to analyze relevant (SST) regions resulting in understandable predictive information for regional climate and hydrology. Simonyan et al. 15 initially proposed the saliency map method as a visualization technique to explain the neural network function mapping, specifically, the extent to which inputs contribute to network output. Due to their effectiveness, explainable deep learning methods have been widely applied to the geosciences and especially to understand climate science and translate to impacts, for example, in spatial drought prediction 16 , satellite-based PM2.5 (air pollution) measurements 17 , crop yields 18 , species distribution models 19 , analysis of hailstorms 20 , hydro-climatological process modeling 21 , precipitation quality control 22 and climate drivers for global temperature 23 , and to localize pest insects in agricultural application 24 . Ham et al. 13 used a saliency map to analyze which regions contributed most in predicting the Niño 3.4 index using their neural network. Similarly, Mahesh et al. 25 applied saliency maps to find the important geographic regions for predicting Niño 3.4 index.
Here we address the problem of developing explainable predictive insights relating to the ENSO phenomenon. Our approach is based on an eXplainable Deep Learning (XDL) solution 15 that concurrently uses convolutional neural networks (CNN) for the prediction of river flow time series and saliency maps to explain the results by highlighting the relative importance of the spatiotemporal SST data. Our implicit hypothesis is that the XDL approach will lead to advances in predictive skills of river flows by considering the information content in the entire SST map, which should exceed the information content of ENSO indices. Furthermore, the XDL approach may lead to discoveries of robust SST teleconnections with each other and with river flows, which in turn would further explain the gains in predictive skills. We develop correlation-based metrics to quantify SST autocorrelations and teleconnections either owing to known proximity-based spatial correlations or owing to known long-range spatial dependence. The approaches are developed for proxy observations (reanalysis) datasets as well as earth system model (ESM) simulated Coupled Modeling Intercomparison Project phase 5 (CMIP5) data, both for assessments of historical skills as well as for use in future projections of teleconnections and river flows which represent a major gap in current generation earth system models [26][27][28] .

Results and discussion
We trained a CNN (Fig. S4) to predict monthly Amazon and Congo River flow from monthly SST derived from Earth System Models (ESM) and reanalysis data. We compared the skill to that of an ensemble of ML models, which predicted river flow using only indices calculated from the Niño 3.4 region (5°S-5°N, 170°W-120°W). These indices include mean SST over the Niño 3.4 region as observed and modeled in reanalysis and ESMs, as well as the Niño 3.4 index, an anomaly value. We found that models with access to the larger SST area (41.5°S-37.5°N, 50.5°E-9.5°W), with its full spatial and temporal provenance. outperformed models using the ENSO indices for prediction of three-month rolling mean river flows on both the Amazon on the Congo River (Fig. 2). The CNN ingesting more SST information also outperformed the historical climatological mean as a predictor of the Amazon and Congo River flows. This suggests the larger SST region was useful for capturing the phase and amplitude of annual river flow fluctuations as well as components of interannual variation. Predictive information on the interannual variability of the Amazon River flow was either not fully expressed in the ENSO region, or else was not captured by the ensemble of ML models (linear regression, lasso regression, ridge regression, elastic net regression, random forest regression, and feed forward dense neural network, or DNN, regression).
For SST as a predictor of river flow, seasonality was not removed to avoid potential information loss when delineating between anomaly and climatological states, which may be imprecise due randomfrequency climate variability with periods exceeding typical climatological time scales. Thus, the task of the SST models was to predict the temporal climatology of river flow values. With the Niño 3.4 index as a predictor of river flow anomaly, and seasonality was subsequently added back to the river flow value. For the Amazon River, we found that all models using climatological SST in the Niño 3.4 outperformed models using the Niño 3.4 anomaly.
The task of predicting Congo River flow was more challenging, perhaps influenced by the more extensive management of the Congo River basin compared to the Amazon River basin. However, predictions based on reanalysis model SST still resulted in lower RMSE than baseline predictions based on the historical climatological mean for the Congo River. In most cases, Congo River flow predictions based on Niño 3.4 anomaly value (index) outperformed predictions based on the climatological value of SST in the Niño 3.4 region. A full comparison of RMSE for river flow prediction using indices and larger area SST is presented in Table S2.
Prediction of river flow using zero lag (concurrent) SST data is relevant to predicting future river flow in climate projections. Mappings between observations of river flow can also give insight into the predictability of the system; deeper analysis of CNN performance and historical average (presented in Tables S4 and S5) suggests that the methods compare differently when different aspects of performance (linear/nonlinear correlation, seasonal/yearly, extremes, etc.) are examined. For example, the ESM + CNN model achieved a lower mean absolute error and stronger association with Amazon River flow by metrics of linear correlation than the climatological mean, but has a higher RMSE in spring, when Amazon River discharge generally peaks.
We used a cyclical saliency map method to identify important spatial areas for the network to make predictions of river flows (Fig. 3). From the saliency maps we discover that the predictive power of ESMs comes mainly from the ENSO and the Indian Ocean Dipole (IOD) regions, suggesting a strong link between these two phenomena and a co-impact on regional hydrology. Figure 3a shows that the dominant salient areas for Amazon River flow prediction are in tropical Pacific and Indian Oceans. Figure 3c shows similar patterns but with less strong and smaller salient areas for Congo River flow. When using reanalysis data (Fig. 3b, d), the saliency maps are much more diffused, suggesting that the CNN model does not pick up any strong relationships between the predictor and predictand. However, the presence of linear and nonlinear information content about river flow in global SST is confirmed by the maps in Fig. S6. The yearly cyclical saliency maps and seasonal saliency maps are also presented in Fig. S5-S8. Whereas saliency maps can be used to verify the physically reasonable relationships that are learned, our hypothesis can be confirmed by examining the degree to which known oceanic regions that correspond to the ENSO region, as well as oceanic regions that correlate with the ENSO region, are triggered by the saliency maps as contributors to the information content.
Complex network theory provides a complementary tool to investigate the short and long-distance relationships in earth systems, such as teleconnections associated with the ENSO phenomenon that are indicated by our results. We analyzed the correlation structure of global SST data by constructing degree maps for reanalysis and ESM SST (Fig. 4). We quantified temporal correlation by calculating Pearson's correlation coefficient between every pair of locations in the ocean. The degree of each geographical location is the number of edges connected to this location, where an edge exists if the correlation is larger than a threshold c 1 . We also set a second correlation threshold c 2 and distance threshold d to define a teleconnection. We define that there is a teleconnection between two locations if their distance is larger than d km and the correlation is larger than c 2 .
We find that ESM SST has high degree values over a large area, indicating that the SST are highly correlated through both proximitybased correlations and teleconnections. There are many teleconnections between tropical Pacific Ocean, Indian Ocean, and even Atlantic Ocean, and they are largely concentrated around the equator (Fig. 4a). The teleconnections remain strong when the correlation threshold is increased (Fig. 4c). This pattern is reflected in the histogram of edges, which shows the degree distribution (Fig. 4e, g). There are many edge counts for long distances, which demonstrate the multicollinearity between SST regions. In contrast, the histograms of edges for reanalysis data (Fig. 4b, d) show fewer long-distance connections for a low correlation threshold, and negligible long-distance connections with a high correlation threshold. These results indicate a weaker correlation structure in reanalysis SST compared to ESM SST, and are consist with recent literature indicating that ESMs tend to exhibit a stronger coupling than reanalysis or observations [29][30][31][32] . Extending these findings, a hypothesis for future studies by climate science and earth system modeling communities is that the coupling strength of ESM model components are usually stronger than those in observations or reanalysis, and that data-driven sciences may be able to quantify and bridge this gap.
Histograms of connection distance in each of ESMs indicate qualitative differences in the correlation structures of the models (Figs. S9, S10); some exhibit a single peak corresponding to proximity-based correlations (e.g. Fig. S9a), while others also exhibit clusters of longrange connections (e.g. Fig. S9f). Models also vary in the rapidity of decay of proximity-based correlations with increasing distance. These attributes of these plots indicate distinct spatiotemporal correlation structures among the climate models.
ENSO is a complex spatiotemporal process with global impacts on SST and the flows of large rivers globally, especially around the tropics and subtropics. In this work, we combined ML methods and interpretive techniques to obtain gains in predictive power and make discoveries about dependence structures and teleconnections in global SST data. Although researchers often analyze the relationship between ENSO indices and the other climate variables, our results indicate that information outside of the canonical ENSO region can help to predict regional hydrology better than some representations based on handselected features. They suggest that additional data and data-driven technologies could lead to a better understanding of mechanisms and the flow of causality in earth systems, as well as to inform climate for mean Reanalysis SSTs. b correlation threshold equal to 0.5 and 0.5 for degree and teleconnection. d Correlation threshold equal to 0.9 and 0.9 for degree and teleconnection. We show teleconnections with distance larger than 19,000 km and 15,000 km for ESM and Reanalysis SST, respectively. e.g. the histogram of edges using correlation threshold 0.5 and 0.9 for mean ESM SST. f, h The histogram of edges using correlation threshold 0.5 and 0.9 for mean Reanalysis SST. adaptation through augmented projections of river flow for future climate scenarios.

Methods
Flowcharts detailing the methodology are provided in Fig. S11. The processing, modeling, and evaluation steps are outlined for reanalysis data (Fig. S11a) and ESM data (Fig. S11b). The ensembling approach used to generate probabilistic river flow predictions is shown in Fig. S11c.

Datasets
We obtained monthly sea surface temperature datasets from ESM simulations and reanalysis models. The ESM datasets are downloaded from NASA Earth eXchange (NEX, https://registry.opendata.aws/ nasanex/, last access May 2021). From the full set of Coupled Model Intercomparison Project Phase 5 (CMIP5) ESMs by various institutes, we discard those which have some months missing, leaving 32 ESMs. The CMIP5 historical forcing experiment spans from January 1950 to December 2005, or 672 months in total. This ESM dataset covers the whole globe with a spatial resolution of 1°longitude by 1°latitude (approximately 100 km by 100 km) with longitudes range from 0.5°E to 359.5°E, and latitudes from 87.5°N to 87.5°S. The ESM names are shown in Table S1.
In addition to ESM simulation datasets, we also use reanalysis datasets which are combinations of sparse on-site observation with other sources (such as remote sensing and satellite imaging) to produce gridded data. It is common to use reanalysis data as the proxy of true observational data because the site-based observational data are very sparse and not gridded. We use three reanalysis datasets in the experiment as predictors: Hadley-OI SST dataset 33 , COBE SST dataset 34 and ERSSTV5 dataset 35 .
The merged Hadley-OI SST dataset (https://climatedataguide. ucar.edu/climate-data/merged-hadley-noaaoi-sea-surfacetemperature-sea-ice-concentration-hurrell-et-al-2008) is a combination of two reanalysis datasets: HadISST1 36 and NOAA OI.v2 37 . The HadISST1 dataset is derived gridded, bias-adjusted in situ observations, and the NOAA OI.v2 dataset combines in situ and satellitederived SST data. The resulting Hadley-NOAA-OI dataset contains monthly mean sea surface temperature from the year 1870 to 2020 with a spatial resolution of 1°longitude by 1°latitude.
The COBE SST dataset (https://climatedataguide.ucar.edu/ climate-data/sst-data-cobe-centennial-situ-observation-basedestimates) are centennial in situ observation-based estimation that combines SSTs from International Comprehensive Ocean-Atmosphere Data Set (ICOADS) 38 release 2.0, the Japanese Kobe collection and reports from ships and buoys. ICOADS is the most comprehensive archive of global marine surface climate observations available, but the data coverage is sparse and neither gridded nor corrected. These datasets were gridded using optimal interpolation. The resulting COBE dataset contains monthly mean sea surface temperature from 1891 to 2020 with a spatial resolution of 1°longitude by 1°latitude.
These datasets have different time spans and spatial resolutions. We performed preprocessing to align the coordinates, interpolate to the same spatial resolution by bilinear interpolation, and select the common time span. A minimal number of missing values were filled with 0, in a similar approach to the zero padding approach in machine learning, where a matrix is surrounded with zeroes to help preserve features at the image edges. After preprocessing, the resulting reanalysis input has 3 channels corresponding to the 3 reanalysis datasets described above with a spatial resolution of 1°longitude by 1°l atitude. We extract the region with latitude from 37.5°N to 42.5°S and longitude from 50.5°E to 0.5°W, roughly covering most of low latitude Pacific Ocean and Indian Ocean. The resulting input image size is 80 × 300 height by width.
The Niño 3.4 SST Index time series is anomaly monthly average SST in the region with latitude from 5°S to 5°N and longitude from 170°W to 120°W with the 1981-2010 mean removed. The data is generated by the NOAA Physical Sciences Laboratory using the HadISST1 dataset 36 .
The river flow dataset was obtained from UCAR (A. Dai 2017) and can be downloaded from UCAR Research Data Archive website (https://rda.ucar.edu/datasets/ds551.0/index.html, last accessed January 2021). The dataset contains monthly runoff (m 3 /month) for many rivers in the world. The record for Amazon River was observed in the downstream Amazon River at a station in Obidos, Brazil from December 1927 to October 2018, totally 1091 months available. The record for Congo River was measured at a station in Kinshasa, Congo from January 1903 to January 2011, totally 1296 months. We calculated the moving mean river flow using a moving window of length 3 months and used it as the smoothed river flow for the third month. We took the smoothing approach the reduction in noise resulted in more robust predictions across all models.
For both predictor (SST) and predictand (river flow) our monthly data span from January 1950 to December 2005. Of this total 672 months, we use the first 600 months as our training data, the following 36 months as our validation data to select the best parameters for the model, and the last 36 months (January 2003 to December 2005) as the test data. While our dataset is limited in size by the record length, in the future additional data, including discharge data from additional rivers, can be used to bolster the results.

Neural network model
The CNN used in this paper consists of 4 convolutional layers and 3 fully connected layers. The number of output channels for each convolutional layer is 32, 32, 64 and 64, respectively. They all have stride 1. The filter sizes in the first three layers are 3 × 3, and for the fourth layer, it is 1 × 1. All convolutional layers are followed by a ReLU activation and a 2D max pooling layer with size 2 × 2 and stride 2 × 2. For the fully connected layers, the number of output feature for each layer is 128, 64 and 1, respectively. The input image size is 80 × 300 × C with different number of channels C. For all ESMs as input, C = 32. For all reanalysis input, C = 3. For mean ESMs or mean reanalysis as input, C = 1. The network output is a scalar. We set the training batch size as 64 and use Adam optimizer with initial learning rate 5 × 10 −5 and weight decay 1 × 10 −4 .We use squared loss function and the network tries to minimize the loss function: 1 where T is the number of training samples, X t ∊R W × H × C is the tth input with width W, height H and number of channels C, y t is the tth ground truth target, w = {w 1 ,…,w L } is the set of weights from all layers. The network output f X t ,w À Á = f L ðf LÀ1 ð. . . f 1 ðX t ,w 1 ÞÞÞ, where f l ð:,w l Þ is the mapping function for the lth layer in the neural network. Predictive uncertainty was estimated as the standard deviation of five repeated CNN predictions with different learning rates.

Saliency map and cyclical saliency map (Cyclic-SM)
The saliency map for a CNN is the derivative of the network output y with respect to the input X : S = ∂y ∂X = ∂f ðX ,wÞ

∂X
(1), where S is the same size as the input 15 . The magnitude of elements S ijk in S reflects how important the corresponding input pixel X ijk (where i,j,k is the index of the width, height and channel of X) is to the output prediction. For climate variables viewed as images in different time frames, they usually exhibit some (irregular) periodicity in the time. We can utilize this property to enhance the saliency map by superimposing individual saliency maps to form a conglomerate saliency map. Specifically, we define the Cyclic-SM with a cycle M as: S c = 1 K + 1 P K k = 0 S t + kM = 1 K + 1 P K k = 0 ∂y t + kM ∂X t + kM (2), where K = TÀt M Ä Å is the number of individual saliency maps in the cycle.
The averaging nature of the Cyclic-SM makes it more robust to gradient fluctuation and noise compared to an ordinary saliency map. In addition, Cyclic-SMs are meaningful in a climate context. For example, for monthly data, M = 12 corresponds to a natural month cycle (January, February, …, December). And we further define seasonal and yearly Cyclic-SM as the sum of saliency maps of the corresponding months. We can calculate different Cyclic-SMs with different cycles depending on the specific purpose and climate data used. For example, we can get daily, monthly, seasonal, annual or other Cyclic-SMs to analyze the dependencies between climate variables in different time scales.